This paper introduces the Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset has both the multi-turn property of conversations in the Dialog State Tracking Challenge datasets, and the unstructured nature of interactions from microblog services such as Twitter. We also describe two neural learning architectures suitable for analyzing this dataset, and provide benchmark performance on the task of selecting the best next response.